Understanding why a model makes certain predictions is crucial when adapting it for real-world decision making. LIME is a popular model-agnostic feature attribution method for the tasks of classification and regression. However, the task of learning to rank in information retrieval is more complex than either classification or regression. In this work, we extend LIME to propose Rank-LIME, a model-agnostic, local, post-hoc linear feature attribution method for the task of learning to rank that generates explanations for ranked lists. We employ novel correlation-based perturbations and differentiable ranking loss functions, and introduce new metrics to evaluate ranking-based additive feature attribution models. We compare Rank-LIME with a variety of competing systems, with models trained on the MS MARCO datasets, and observe that Rank-LIME outperforms existing explanation algorithms in terms of Model Fidelity and Explain-NDCG. With this, we propose one of the first algorithms to generate additive feature attributions for explaining ranked lists.
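The abstract above does not define Explain-NDCG, so the following is only a minimal sketch of the standard NDCG computation that such a metric presumably builds on. It scores how well an explanation model's ordering agrees with a black-box ranker by treating the black-box scores as graded relevance. The function `ranking_agreement`, the document ids, and the scores are hypothetical illustrations, not the paper's metric or data.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of relevances listed in ranked order."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(relevances):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


def ranking_agreement(explained_order, black_box_scores):
    """Hypothetical helper (an assumption, not from the paper): NDCG of the
    explanation model's ordering, using black-box scores as relevance."""
    return ndcg([black_box_scores[doc] for doc in explained_order])


# Made-up scores from a black-box ranker for three documents.
black_box_scores = {"d1": 0.9, "d2": 0.5, "d3": 0.1}

same_order = ranking_agreement(["d1", "d2", "d3"], black_box_scores)
swapped_order = ranking_agreement(["d1", "d3", "d2"], black_box_scores)
```

An ordering identical to the black-box ranking scores 1.0, and any disagreement lowers the value, with mistakes near the top of the list penalised most.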
Recently, Smart Video Surveillance (SVS) systems have been receiving more attention among scholars and developers as a substitute for the current passive surveillance systems. These systems are used to make policing and monitoring more efficient and to improve public safety. However, because these systems monitor the public's daily activities, they raise a range of ethical challenges. There are different approaches to addressing privacy issues in implementing SVS. In this paper, we focus on the role of design in addressing the ethical and privacy challenges of SVS. Reviewing four policy protection regulations that generate an overview of best practices for privacy protection, we argue that ethical and privacy concerns could be addressed through four lenses: algorithm, system, model, and data. As a case study, we describe our proposed system and illustrate how it can create a baseline for designing a privacy-preserving system that delivers safety to society. We used several Artificial Intelligence algorithms, such as object detection, single- and multi-camera re-identification, action recognition, and anomaly detection, to provide a basic functional system. We also use cloud-native services to implement a smartphone application that delivers the outputs to end users.
In recent years, we have seen a significant interest in data-driven deep learning approaches for video anomaly detection, where an algorithm must determine if specific frames of a video contain abnormal behaviors. However, video anomaly detection is particularly context-specific, and the availability of representative datasets heavily limits real-world accuracy. Additionally, the metrics currently reported by most state-of-the-art methods often do not reflect how well the model will perform in real-world scenarios. In this article, we present the Charlotte Anomaly Dataset (CHAD). CHAD is a high-resolution, multi-camera anomaly dataset in a commercial parking lot setting. In addition to frame-level anomaly labels, CHAD is the first anomaly dataset to include bounding box, identity, and pose annotations for each actor. This is especially beneficial for skeleton-based anomaly detection, which is useful for its lower computational demand in real-world settings. CHAD is also the first anomaly dataset to contain multiple views of the same scene. With four camera views and over 1.15 million frames, CHAD is the largest fully annotated anomaly detection dataset including person annotations, collected from continuous video streams from stationary cameras for smart video surveillance applications. To demonstrate the efficacy of CHAD for training and evaluation, we benchmark two state-of-the-art skeleton-based anomaly detection algorithms on CHAD and provide comprehensive analysis, including both quantitative results and qualitative examination.
One of the main challenges in deep learning-based underwater image enhancement is the limited availability of high-quality training data. Underwater images are difficult to capture and are often of poor quality due to the distortion and loss of colour and contrast in water. This makes it difficult to train supervised deep learning models on large and diverse datasets, which can limit the model's performance. In this paper, we explore an alternative approach to supervised underwater image enhancement. Specifically, we propose a novel unsupervised underwater image enhancement framework that employs a conditional variational autoencoder (cVAE) to train a deep learning model with probabilistic adaptive instance normalization (PAdaIN) and statistically guided multi-colour space stretch that produces realistic underwater images. The resulting framework is composed of a U-Net as a feature extractor and a PAdaIN to encode the uncertainty, which we call UDnet. To improve the visual quality of the images generated by UDnet, we use a statistically guided multi-colour space stretch module that ensures visual consistency with the input image and provides an alternative to training using a ground truth image. The proposed model does not need manual human annotation and can learn with a limited amount of data and achieves state-of-the-art results on underwater images. We evaluated our proposed framework on eight publicly-available datasets. The results show that our proposed framework yields competitive performance compared to other state-of-the-art approaches in quantitative as well as qualitative metrics. Code available at https://github.com/alzayats/UDnet .
This paper presents an algorithm that relies on a series of dense and deep neural networks for passive microwave retrieval of precipitation. The neural networks learn from coincidences of brightness temperatures from the Global Precipitation Measurement (GPM) Microwave Imager (GMI) with the active precipitation retrievals from the Dual-frequency Precipitation Radar (DPR) onboard GPM as well as those from the CloudSat Profiling Radar (CPR). The algorithm first detects the precipitation occurrence and phase and then estimates its rate, while conditioning the results on some key ancillary information, including parameters related to cloud microphysical properties. The results indicate that we can reconstruct the DPR rainfall and CPR snowfall with a detection probability of more than 0.95 while the probability of a false alarm remains below 0.08 and 0.03, respectively. Conditioned on the occurrence of precipitation, the unbiased root mean squared error in estimation of rainfall (snowfall) rate using DPR (CPR) data is less than 0.8 (0.1) mm/hr over oceans and land. Beyond methodological developments, comparing the results with ERA5 reanalysis and official GPM products demonstrates that the uncertainty in global satellite snowfall retrievals continues to be large, while there is good agreement among rainfall products. Moreover, the results indicate that CPR active snowfall data can improve passive microwave estimates of global snowfall, while the current CPR rainfall retrievals should only be used for detection and not for estimation of rates.
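The detection and error figures quoted above can be illustrated with a minimal sketch. The abstract does not give formulas, so two assumptions are made here: "probability of a false alarm" is taken as false alarms divided by all non-precipitating cases (a false alarm ratio over predicted events is another common convention), and the unbiased RMSE is the RMSE of the errors after removing their mean. All function names and sample values are hypothetical.

```python
import math


def detection_scores(predicted, observed):
    """Return (probability of detection, probability of false detection).

    Inputs are equal-length sequences of booleans or 0/1 flags.
    """
    pairs = list(zip(predicted, observed))
    hits = sum(1 for p, o in pairs if p and o)
    misses = sum(1 for p, o in pairs if not p and o)
    false_alarms = sum(1 for p, o in pairs if p and not o)
    correct_negatives = sum(1 for p, o in pairs if not p and not o)

    events = hits + misses
    non_events = false_alarms + correct_negatives
    pod = hits / events if events else float("nan")
    # Assumed definition: false alarms over all observed non-events.
    pofd = false_alarms / non_events if non_events else float("nan")
    return pod, pofd


def unbiased_rmse(estimates, references):
    """RMSE of the errors after subtracting the mean error (the bias)."""
    errors = [e - r for e, r in zip(estimates, references)]
    bias = sum(errors) / len(errors)
    return math.sqrt(sum((err - bias) ** 2 for err in errors) / len(errors))


# Made-up detection flags: 2 hits, 1 miss, 1 false alarm, 1 correct negative.
pod, pofd = detection_scores([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])

# A constant offset is pure bias, so the unbiased RMSE is zero.
constant_offset = unbiased_rmse([2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
symmetric_error = unbiased_rmse([1.0, 3.0], [2.0, 2.0])
```

Because the mean error is removed first, the unbiased RMSE isolates the random component of the retrieval error from any systematic offset.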
Disentanglement of constituent factors of a sensory signal is central to perception and cognition and hence is a critical task for future artificial intelligence systems. In this paper, we present a compute engine capable of efficiently factorizing holographic perceptual representations by exploiting the computation-in-superposition capability of brain-inspired hyperdimensional computing and the intrinsic stochasticity associated with analog in-memory computing based on nanoscale memristive devices. Such an iterative in-memory factorizer is shown to solve problems at least five orders of magnitude larger than those that could otherwise be solved, while also significantly lowering the computational time and space complexity. We present a large-scale experimental demonstration of the factorizer by employing two in-memory compute chips based on phase-change memristive devices. The dominant matrix-vector multiply operations are executed in O(1) time, thus reducing the computational time complexity to merely the number of iterations. Moreover, we experimentally demonstrate the ability to factorize visual perceptual representations reliably and efficiently.
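To make the factorization problem concrete, here is a minimal sketch of the task, not of the in-memory factorizer itself. It assumes bipolar (+1/-1) hypervectors bound by element-wise multiplication, a common convention in hyperdimensional computing, and recovers the factors of a product vector by exhaustive search over two small codebooks. That search grows with the product of the codebook sizes, which is the cost an iterative factorizer aims to avoid. All names, sizes, and the seed are hypothetical.

```python
import random


def random_bipolar(dim, rng):
    """Random hypervector with entries drawn from {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(dim)]


def bind(u, v):
    """Bind two bipolar hypervectors by element-wise multiplication."""
    return [a * b for a, b in zip(u, v)]


def dot(u, v):
    """Unnormalised similarity between two hypervectors."""
    return sum(a * b for a, b in zip(u, v))


def factorize_brute_force(product, codebook_a, codebook_b):
    """Try every (i, j) pair and return the one whose binding best
    matches the product vector. Cost: len(codebook_a) * len(codebook_b)."""
    best_pair, best_score = None, None
    for i, a in enumerate(codebook_a):
        for j, b in enumerate(codebook_b):
            score = dot(bind(a, b), product)
            if best_score is None or score > best_score:
                best_pair, best_score = (i, j), score
    return best_pair


rng = random.Random(0)  # fixed seed only for reproducibility
DIM = 512               # illustrative dimensionality
codebook_a = [random_bipolar(DIM, rng) for _ in range(4)]
codebook_b = [random_bipolar(DIM, rng) for _ in range(4)]

# Product vector built from item 2 of the first codebook and item 1 of the second.
product = bind(codebook_a[2], codebook_b[1])
recovered = factorize_brute_force(product, codebook_a, codebook_b)

# With bipolar vectors, binding is its own inverse: a known factor can be removed.
unbound = bind(product, codebook_a[2])
```

With F factors and codebooks of M items each, exhaustive search needs M^F comparisons, which is why larger problems call for an iterative scheme of the kind the abstract describes.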
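In recent years, face recognition systems have achieved exceptional success due to promising advances in deep learning architectures. However, they still fail to achieve the expected accuracy when matching profile images against a gallery of frontal images. Current approaches either perform pose normalization (i.e., frontalization) or disentangle pose information for face recognition. In contrast, we propose a new approach that uses pose as auxiliary information via an attention mechanism. In this paper, we hypothesize that pose-attended information obtained through an attention mechanism can guide contextual and distinctive feature extraction from profile faces, which in turn benefits representation learning in an embedded domain. To achieve this, first, we design a unified coupled profile-to-frontal face recognition network. It learns the mapping from faces to a compact embedding subspace through a class-specific contrastive loss. Second, we develop a novel pose attention block (PAB) to specifically guide the extraction of pose-agnostic features from profile faces. More specifically, PAB is designed to explicitly help the network focus on important features along both the channel and spatial dimensions while learning discriminative yet pose-invariant features in an embedding subspace. To validate the effectiveness of our proposed method, we conduct experiments on both controlled and in-the-wild benchmarks, including Multi-PIE, CFP, and IJBC, and show superiority over the state of the art.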
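Modern vehicles equipped with sensors, actuators, and electronic control units (ECUs) can be divided into several operational subsystems called functional working groups (FWGs). Examples of these FWGs include the engine system, transmission, fuel system, brakes, and so on. Each FWG has associated sensor channels that gauge vehicular operating conditions. This data-rich environment is conducive to the development of predictive maintenance (PDM) technologies. What holds back various PDM technologies is the need for robust anomaly detection models that can identify events or observations that deviate significantly from the majority of the data and do not conform to a well-defined notion of normal vehicular operational behavior. In this paper, we introduce the Vehicle Performance, Reliability, and Operations (VEPRO) dataset and use it to create a multi-phased approach to anomaly detection. Utilizing Temporal Convolution Networks (TCN), our anomaly detection system can achieve 96% detection accuracy and accurately predicts 91% of true anomalies. The performance of our anomaly detection system improves when sensor channels from multiple FWGs are utilized.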
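When sufficient training data is available, transformer-based models such as the Vision Transformer (ViT) can surpass convolutional neural networks (CNNs) in some vision tasks. However, CNNs have strong and useful inductive biases for vision tasks (i.e., translation equivariance and locality). In this work, we develop a novel model architecture that we call the Mobile Fish Landmark Detection Network (MFLD-NET). We built this model using convolution operations based on ViT components (i.e., patch embeddings and multi-layer perceptrons). MFLD-NET can achieve competitive or better results while being lightweight, and is therefore suitable for embedded and mobile devices. Furthermore, we show that MFLD-NET can attain keypoint (landmark) estimation accuracy on par with, or even better than, some state-of-the-art CNNs on a fish image dataset. In addition, unlike ViT, MFLD-NET does not need a pre-trained model and generalizes well when trained on a small dataset. We provide quantitative and qualitative results that demonstrate the model's generalization capabilities. This work will provide a foundation for future efforts to develop mobile yet efficient fish monitoring systems and devices.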
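In this paper, we seek to draw connections between frontal and profile face images in an abstract embedding space. We exploit this connection using a coupled-encoder network to project frontal and profile face images into a common latent embedding space. The proposed model forces the similarity of representations in the embedding space by maximizing the mutual information between the two views of the face. The proposed coupled encoder benefits from three contributions for matching faces with extreme pose disparities. First, we leverage our pose-aware contrastive learning to maximize the mutual information between frontal and profile representations of identities. Second, a memory buffer, consisting of latent representations accumulated over past iterations, is integrated into the model so that it can refer to many more instances than the mini-batch size allows. Third, a novel pose-aware adversarial domain adaptation method forces the model to learn an asymmetric mapping from profile to frontal representations. In our framework, the coupled encoder learns to enlarge the margin between the distributions of genuine and imposter faces, which results in high mutual information between different views of the same identity. The effectiveness of the proposed model is investigated through extensive experiments, evaluations, and ablation studies on four benchmark datasets, and through comparison with compelling state-of-the-art algorithms.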
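The abstract above does not specify the contrastive objective, so the following is a minimal sketch under the assumption of an InfoNCE-style loss, a widely used contrastive objective whose minimization maximizes a lower bound on mutual information. Negatives are drawn from a fixed-size buffer of past embeddings. The function names, buffer size, temperature, and toy embeddings are hypothetical, and the pose-aware and adversarial components are not modelled.

```python
import math
from collections import deque


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(query, positive, negatives, temperature=0.1):
    """InfoNCE loss: negative log-softmax of the positive pair's similarity
    against the positive plus all negatives."""
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, neg) / temperature for neg in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[0]


# Fixed-size buffer of embeddings from earlier iterations (oldest dropped first).
memory_buffer = deque(maxlen=4)
memory_buffer.append([0.0, 1.0])
memory_buffer.append([-1.0, 0.0])

profile_embedding = [1.0, 0.0]        # toy "profile view" embedding
frontal_same_identity = [0.9, 0.1]    # toy "frontal view" of the same identity
frontal_other_identity = [0.0, 1.0]   # toy frontal view of a different identity

loss_matched = info_nce(profile_embedding, frontal_same_identity, memory_buffer)
loss_mismatched = info_nce(profile_embedding, frontal_other_identity, memory_buffer)
```

The loss is small when the two views of one identity are closer to each other than to anything in the buffer, and a larger buffer supplies more negatives than a single mini-batch could.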